Code Size Efficiency in Global Scheduling for VLIW/EPIC Style Embedded Processors

Authors

  • Huiyang Zhou
  • Thomas M. Conte
Abstract

In embedded computing, code size strongly affects system cost and performance. In global scheduling for VLIW/EPIC-style embedded processors, region-enlarging optimizations, especially tail duplication, are commonly used to exploit instruction-level parallelism (ILP) and boost performance. The code size increase caused by such optimizations, however, raises serious concerns about I-cache, branch, and TLB performance. In this paper, we focus on the code size efficiency of code-size-related optimizations in global scheduling. First, we propose the ratio of static IPC (instructions per cycle) change to code size change as a quantitative, compile-time measure of the code size efficiency of any code-size-related optimization. Then, based on the code size efficiency of tail duplication, we propose solutions to two related problems: (1) how to achieve the best performance for a given code size increase, and (2) how to obtain the optimal code size efficiency for any program. Our study shows that the code size increase resulting from tail duplication has a significant but varying impact on IPC: for example, the first 2% of code size increase yields an 18.5% increase in static IPC, whereas static IPC changes by less than 1% as the code size increase grows from 20% to 30%. We use this property to define the optimal code size efficiency and to derive a simple yet robust threshold scheme for finding it. Experimental results on the SPECint95 benchmarks show that this threshold scheme finds the optimal efficiency accurately. Although the optimal-efficiency results show an average code size increase of 2% over the original code, I-cache performance improves (a 4% decrease in I-cache penalty for a 32KB I-cache) because of increased locality. Overall, the optimal-efficiency results show a speedup of 17% over the natural treegion results (treegions without any tail duplication).
Experiments with different I-cache sizes show that the speedup also holds for a small 16KB I-cache.

1. This is an expanded version of a paper presented at the INTERACT-6 workshop, held in conjunction with HPCA-8 on February 3, 2002. Approximately one third of this paper is new and has not been published before.
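
The efficiency measure described in the abstract can be illustrated with a small sketch. It assumes that both changes are taken as relative (fractional) changes, which the abstract does not state explicitly. The function names and the candidate tuples are hypothetical, and the threshold filter is only an illustration of the idea of a cutoff on efficiency, not the threshold scheme derived in the paper.

```python
def code_size_efficiency(ipc_before, ipc_after, size_before, size_after):
    """Ratio of static IPC change to code size change.

    Assumption: both changes are relative (fractional) changes.
    """
    d_ipc = (ipc_after - ipc_before) / ipc_before
    d_size = (size_after - size_before) / size_before
    if d_size == 0:
        raise ValueError("code size did not change; efficiency is undefined")
    return d_ipc / d_size


def select_by_threshold(candidates, threshold):
    """Keep hypothetical candidates whose efficiency meets the threshold.

    candidates: iterable of (name, relative_ipc_gain, relative_size_increase).
    This is an illustrative filter, not the paper's scheme.
    """
    return [
        name
        for name, d_ipc, d_size in candidates
        if d_size > 0 and d_ipc / d_size >= threshold
    ]


# Figures quoted in the abstract: the first 2% of code growth gives 18.5% more
# static IPC, while growth from 20% to 30% changes static IPC by under 1%.
first = code_size_efficiency(1.0, 1.185, 100.0, 102.0)
chosen = select_by_threshold(
    [("early", 0.185, 0.02), ("late", 0.01, 0.10)], threshold=1.0
)
```

Under these assumptions the early duplication has an efficiency of roughly 9.25, while the late one falls at or below about 0.1, so any cutoff between those values keeps the first and rejects the second.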


Similar resources

Automatic Design of VLIW and EPIC Instruction Formats

Keywords: instruction format design, template design, instruction-set architecture, abstract ISA, concrete ISA, VLIW processors, EPIC processors, HPL-PD architecture, instruction encoding, bit allocation, affinity allocation, application-specific processors, design space exploration.

Very long instruction word (VLIW), and in its generalization, explicitly parallel instruction computing (EPIC) architectures...

Machine-Description Driven Compilers for EPIC and VLIW Processors

In the past, due to the restricted gate count available on an inexpensive chip, embedded DSPs have had limited parallelism, few registers and irregular, incomplete interconnectivity. More recently, with increasing levels of integration, embedded VLIW processors have started to appear. Such processors typically have higher levels of instruction-level parallelism, more registers, and a relatively...

A Treegion-based Unified Approach to Speculation and Predication in Global Instruction Scheduling

This paper presents a treegion-based global scheduling technique for wide issue VLIW/EPIC processors. A treegion is a single-entry/multiple-exit global scheduling scope that consists of basic blocks with control-flow forming a tree. We propose a two-phase approach to global scheduling within a treegion scope that enables speculative code motion in the first phase and uses predication of all ins...

An Automatic System for Application-Specific Instruction Format Design and Code Generation for VLIW and EPIC processors

Introduction. Whereas the workstation and personal computer markets are rapidly converging on a small number of similar architectures, the embedded systems market is enjoying an explosion of architectural diversity. This diversity is driven by demands for higher performance at a lower cost and power consumption, and is propelled by the possibility of designing application-specific instruction-s...

Co-design of Compiler and Hardware Techniques to Reduce Program Code Size on a VLIW Processor

Code size is a primary concern in the embedded computing community. Minimizing physical memory requirements reduces total system cost and improves performance and power efficiency. VLIW processors rely on the compiler to statically encode the ILP in the program before its execution, and because of this, code size is larger relative to other processors. In this paper we describe the co-design of...

Publication date: 2002